test(e2e): run gpu workloads from manifest #1709
🌿 Preview your docs: https://nvidia-preview-pr-1709.docs.buildwithfern.com/openshell
Blocked
Gator is blocked by merge conflicts with the base branch. GitHub reports
Next action: @elezar, please rebase or merge the base branch and resolve the conflicts, then push an updated head so gator can validate and review the PR.
Re-check After Author Update
I re-evaluated latest head
Disposition: partially resolved.
Remaining items:
Docs: GPU e2e documentation and gateway config reference updates are present; no docs navigation change appears necessary.
Next state:
Re-check After CI Update
I re-evaluated latest head
Disposition: partially resolved, but not ready to leave review.
Remaining items:
Next state:
Re-check After Author Update
I re-evaluated latest head
Disposition: resolved for review; CI is still in progress.
Remaining items:
Docs: GPU e2e documentation is updated under
Next state:
Maintainer Approval Needed
Gator validation and PR monitoring are complete.
Validation: maintainer-authored, project-valid GPU E2E test-harness work linked to #1472.
Human maintainer approval or merge decision is now required.
Blocked
Gator is blocked from completing the required independent re-review for current head
The PR also has new CI in progress for this head, so it is not ready to stay in
Next action: OpenShell sandbox operator should refresh or relaunch gator with a working reviewer sub-agent, then re-run gator so the current diff can be independently reviewed and the pending checks can be reconciled. No PR author action is requested by this blocker.
Re-check After Reviewer Update
I re-evaluated latest head
Disposition: blocker resolved, but review follow-up is still needed.
Remaining items:
Checks:
Docs: GPU E2E documentation is updated under
Next state:
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Summary
This PR adds manifest-driven GPU workload execution tests on top of the workload image artifacts from #1484. It keeps the existing GPU device-selection coverage, adds workload execution coverage under the umbrella `gpu` target, and documents how to build workload images locally before running the GPU e2e suite.

This branch is now rebased on the local e2e stabilization fixes from #1935, so the Docker GPU test path also includes the supervisor-image and host SSH linker-environment fixes needed for local Nix/devenv runs.
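As a rough illustration of the manifest handling this PR describes, the sketch below shows one way the manifest path lookup and the `pass`/`fail` expectation check could be written. It is not taken from the PR diff: the default path and the `OPENSHELL_E2E_WORKLOAD_MANIFEST` variable come from the description, while `Expectation`, `resolve_manifest_path`, `manifest_path_from_env`, and `check_expectation` are hypothetical names, and treating an empty override as unset is an assumption. YAML parsing with `serde_yaml` is left out so the sketch needs only the standard library.

```rust
use std::path::PathBuf;

/// Default manifest location named in the PR description.
const DEFAULT_MANIFEST: &str = "e2e/gpu/images/.build/workloads.yaml";
/// Environment variable named in the PR description for external manifests.
const MANIFEST_ENV: &str = "OPENSHELL_E2E_WORKLOAD_MANIFEST";

/// Hypothetical type: the outcome a manifest entry declares for a workload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Expectation {
    Pass,
    Fail,
}

/// Hypothetical helper: use the override when it is set and non-blank,
/// otherwise fall back to the default manifest path.
/// Assumption: a blank override is treated the same as an unset one.
fn resolve_manifest_path(override_value: Option<&str>) -> PathBuf {
    match override_value {
        Some(path) if !path.trim().is_empty() => PathBuf::from(path),
        _ => PathBuf::from(DEFAULT_MANIFEST),
    }
}

/// Hypothetical helper: read the override from the process environment.
fn manifest_path_from_env() -> PathBuf {
    resolve_manifest_path(std::env::var(MANIFEST_ENV).ok().as_deref())
}

/// Hypothetical helper: compare a workload's observed result with its
/// declared expectation and describe any mismatch.
fn check_expectation(name: &str, expected: Expectation, succeeded: bool) -> Result<(), String> {
    match (expected, succeeded) {
        (Expectation::Pass, true) | (Expectation::Fail, false) => Ok(()),
        (Expectation::Pass, false) => {
            Err(format!("workload {name} was expected to pass but failed"))
        }
        (Expectation::Fail, true) => {
            Err(format!("workload {name} was expected to fail but passed"))
        }
    }
}
```

A test harness built this way would call `manifest_path_from_env()` once, load the entries from that file, and pass each workload's observed result to `check_expectation`, so that a workload declared `fail` only counts as a success when it actually fails.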
Related Issue
Closes #1472
Changes
- Run manifest workloads with `openshell sandbox create --gpu --from <image> -- <command>` and enforce declared `pass` or `fail` expectations.
- Read `e2e/gpu/images/.build/workloads.yaml` by default, with `OPENSHELL_E2E_WORKLOAD_MANIFEST` available for external manifests.
- Add `serde_yaml` to the e2e crate for manifest parsing.
Testing
- `mise run pre-commit` passes
Validation status:
- `mise run e2e:docker:gpu`
- `mise run pre-commit` was run after rebasing onto `main`; Rust format/check/clippy, markdown lint, Python format, license checks, and docs checks completed successfully.
- `mise run pre-commit` currently fails in `helm:lint` because the local chart dependency directory is missing the `postgresql` dependency. This is unrelated to the GPU workload changes.
GPU validation commands for future runs:
- `mise run e2e:workloads:build`
- `mise run e2e:docker:gpu`
Notes:
- Run `mise run e2e:workloads:build` before running `mise run e2e:docker:gpu` locally.
- Set `OPENSHELL_E2E_WORKLOAD_MANIFEST=/abs/path/to/workloads.yaml` to use an external manifest.
Checklist